NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

System and Design Technology Co-Optimization of SOT-MRAM for High-Performance AI Accelerator Memory System

https://doi.org/10.1109/TCAD.2023.3333754

Mishty, Kaniz; Sadi, Mehdi (April 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

System on chips (SoCs) are now designed with their own artificial intelligence (AI) accelerator segment to accommodate the ever-increasing demand of deep learning (DL) applications. With powerful multiply and accumulate (MAC) engines for matrix multiplications, these accelerators show high computing performance. However, because of limited memory resources (i.e., bandwidth and capacity), they fail to achieve optimum system performance during large batch training and inference. In this work, we propose a memory system with high on-chip capacity and bandwidth to shift the gear of AI accelerators from memory-bound to achieving system-level peak performance. We develop the memory system with design technology co-optimization (DTCO)-enabled customized spin-orbit torque (SOT)-MRAM as large on-chip memory through system technology co-optimization (STCO) and detailed characterization of the DL workloads. Our workload-aware memory system achieves 8× energy and 9× latency improvement on computer vision (CV) benchmarks in training and 8× energy and 4.5× latency improvement on natural language processing (NLP) benchmarks in training while consuming only around 50% of SRAM area at iso-capacity.
more » « less
Full Text Available
Chiplet-Gym: Optimizing Chiplet-Based AI Accelerator Design With Reinforcement Learning

https://doi.org/10.1109/TC.2024.3457740

Mishty, Kaniz; Sadi, Mehdi (January 2025, IEEE Transactions on Computers)

Not AvailableModern Artificial Intelligence (AI) workloads demand computing systems with large silicon area to sustain throughput and competitive performance. However, prohibitive manufacturing costs and yield limitations at advanced tech nodes and die-size reaching the reticle limit restrain us from achieving this. With the recent innovations in advanced packaging technologies, chiplet-based architectures have gained significant attention in the AI hardware domain. However, the vast design space of chiplet-based AI accelerator design and the absence of system and package-level co-design methodology make it difficult for the designer to find the optimum design point regarding Power, Performance, Area, and manufacturing Cost (PPAC). This paper presents Chiplet-Gym, a Reinforcement Learning (RL)-based optimization framework to explore the vast design space of chiplet-based AI accelerators, encompassing the resource allocation, placement, and packaging architecture. We analytically model the PPAC of the chiplet-based AI accelerator and integrate it into an OpenAI gym environment to evaluate the design points. We also explore non-RL-based optimization approaches and combine these two approaches to ensure the robustness of the optimizer. The optimizer-suggested design point achieves 1.52× throughput, 0.27× energy, and 0.89× cost of its monolithic counterpart at iso-area.
more » « less
Full Text Available
System and Design Technology Co-optimization of Chiplet-based AI Accelerator with Machine Learning

https://doi.org/10.1145/3583781.3590233

Mishty, Kaniz; Sadi, Mehdi (June 2023, Proceedings of the Great Lakes Symposium on VLSI)

With the availability of advanced packaging technology and its attractive features, the chiplet-based architecture has gained traction among chip designers. The large design space and the lack of system and package-level co-design methods make it difficult for the designers to create the optimum design choices. In this research, considering the colossal design space of advanced packaging technologies, resource allocation, and chiplet placement, we design an optimizer that looks for the design choices that maximize the Power, Performance, and Area (PPA) and minimize the cost of the chiplet-based AI accelerator. Inspired by the Bayesian approach for black-box function optimization, our optimizer guides the search space toward global maxima instead of randomly traversing through the search space. We analytically synthesize a dataset from the search space and train an ML model to predict the target value of our defined cost function at the optimizer-suggested points. The optimizer locates the optimum design choices from the specified search space (≥ 1M data points) with minimal iterations (≤ 200 iterations) and trivial run time.
more » « less
Full Text Available
Designing Efficient and High-Performance AI Accelerators With Customized STT-MRAM

https://doi.org/10.1109/TVLSI.2021.3105958

Mishty, Kaniz; Sadi, Mehdi (October 2021, IEEE Transactions on Very Large Scale Integration (VLSI) Systems)

Full Text Available
Special Session: On the Reliability of Conventional and Quantum Neural Network Hardware

https://doi.org/10.1109/VTS52500.2021.9794194

Sadi, Mehdi; He, Yi; Li, Yanjing; Alam, Mahabubul; Kundu, Satwik; Ghosh, Swaroop; Bahrami, Javad; Karimi, Naghmeh (April 2022, VLSI Test Symposium)

Full Text Available
Effects of Temperature and Structural Geometries on a Skyrmion Logic Gate

https://doi.org/10.1109/TED.2021.3130217

Tang, Chunli; Alahmed, Laith; Xu, Jihao; Shen, Maokang; Jones, Nicholas Alex; Sadi, Mehdi; Guin, Ujjwal; Zhao, Wenfeng; Li, Peng (April 2022, IEEE Transactions on Electron Devices)

Full Text Available
Special Session: Reliability Analysis for AI/ML Hardware

https://doi.org/10.1109/VTS50974.2021.9441050

Kundu, Shamik; Basu, Kanad; Sadi, Mehdi; Titirsha, Twisha; Song, Shihao; Das, Anup; Guin, Ujjwal (April 2021, 2021 IEEE 39th VLSI Test Symposium (VTS))

Full Text Available

Search for: All records